Detection of Complex Networks Modularity by Dynamical Clustering 
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Based on cluster de-synchronization properties of phase oscillators, we introduce an efficient 
method for the detection and identification of modules in complex networks. The performance 
of the algorithm is tested on computer generated and real-world networks whose modular structure 
is already known or has been studied by means of other methods. The algorithm attains a high 
level of precision, especially when the modular units are very mixed and hardly detectable by the 
other methods, with a computational effort O(KN) on a generic graph with N nodes and K links. 
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Hierarchical modular structures constitute one of the 
most important property of real- world networked systems 
1]. For instance, tightly connected groups of nodes in a 
social network represent individuals belonging to social 
communities, while modules in cellular and genetic net- 
works are somehow related to functional modules. Con- 
sequently, identifying the modular structure of a com- 
plex network is a crucial issue in the analysis and un- 
derstanding of the growth mechanisms and the processes 
running on top of such networks. Modules (called some- 
times community structures in social science) are tightly 
connected subgraphs of a network, i.e. subsets of nodes 
within which the network connections are dense, and be- 
tween which connections are sparser. Nodes, indeed, be- 
longing to such tight-knit modules, constitute units that 
separately contribute to the collective functioning of the 
network. For instance, subgroups in social networks of- 
ten have their own norms, orientations and subcultures, 
sometimes running counter to the official culture, and are 
the most important source of a person's identity. Analo- 
gously, the presence of subgroups in biological and tech- 
nological networks is at the basis of their functioning. 

The detection of the modular structure of a network 
is formally equivalent to the classical graph partitioning 
problem in computer science, that finds many practical 
applications such as load balancing in parallel computa- 
tion, circuit partitioning and telephone network design, 
and is known to be a NP-complete problem Q. There- 
fore, although modules detection in large graphs is po- 
tentially very relevant and useful, so far this trial has 
been seriously hampered by the large associated compu- 
tational demand. To overcome this limitation, a series 
of efficient heuristic methods has been proposed over the 
years. Examples include methods based on spectral anal- 
ysis 0], or the hierarchical clustering methods developed 
in the context of social networks analysis 0], or meth- 
ods allowing for multi-community membership [j| . More 
recently, Girvan and Newman (GN) have proposed an 



algorithm which works quite well for general cases [(|. 
The GN is an iterative divisive method based in finding 
and removing progressively the edges with the largest be- 
tweenness, until the network breaks up into components. 
The betweenness bij of an edge connecting nodes i and j 
is defined as the number of shortest paths between pairs 
of nodes that run through that edge As the few 
edges lying between modules are expected to be those 
with the highest betweenness, by removing them recur- 
sively a separation of the network into its communities 
can be found. Therefore, the GN algorithm produces a 
hierarchy of subdivisions of a network of N nodes, from 
a single component to N isolated nodes. In order to 
know which of the divisions is the best one, Girvan and 
Newman have proposed to look at the maximum of the 
modularity Q, a quantity measuring the degree of corre- 
lation between the probability of having an edge joining 
two sites and the fact that the sites belong to the same 
modular unit (see Ref. Q for the mathematical definition 
of Q). The GN algorithm runs in 0(K 2 N) time on an 
arbitrary graph with K edges and N vertices, or 0(N 3 ) 
time on a sparse graph. In fact, calculating the between- 
ness for all edges requires a time of the order of KN 
, since it corresponds to the evaluation of all-shortest- 
paths (ASP) problem. And the betweenness for all edges 
need to be recalculated every time after an edge has been 
removed (the betweenness recalculation is a fundamen- 
tal aspect of the GN algorithm) |6:]. This restricts the 
applications to networks of at most a few thousands of 
vertices with current hardware facilities. Successively, a 
series of faster methods directly based on the optimiza- 
tion of Q have been proposed [H, Q , which allow up to a 
0(AHog 2 N) running time for finding modules in sparse 
graphs. 

All the above mentioned methods are based on the 
structure of a network, meaning that they use solely the 
information contained in the adjacency matrix A = {ay} 
(or any equivalent representation of the topology) of the 
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graph. As complementary to such approaches, the au- 
thors of Ref. [lfj have proposed a method to find mod- 
ules based on the statistical properties of a system of spins 
(namely q-state Potts spins) associated to the nodes of 
the graphs. In this Letter we propose a dynamical clus- 
tering (DC) method based on the properties of a dynam- 
ical system associated to the graph. DC techniques were 
initiated by the relevant observation that topological hi- 
erarchies can be associated to dynamical time scales in 
the transient of a synchronization process of coupled os- 
cillators Although being fast, so far DC methods do 
not provide in general the same accuracy in the identifi- 
cation of the communities. 

Here, we show how to combine topological and dy- 
namical information in order to devise a DC algorithm 
that is able to identify the modular structure of a graph 
with a precision competitive with the best techniques 
based solely on the topology. The method we present is 
based upon the well-known phenomenon of synchroniza- 
tion clusters of non identical phase oscillators each 
one associated to a node, and interacting through the 
edges of the graph. Clusters of synchronized oscillators 
represent an intermediate regime between global phase 
locking and full absence of synchronization, thus imply- 
ing a division of the whole graphs into groups of elements 
which oscillate at the same (average) frequency. The key 
idea is that, starting from a fully synchronized state of 
the network, a dynamical change in the weights of the 
interactions, that retain information on the original be- 
tweenness distribution, yields a progressive hierarchical 
clustering that fully detects modular structures. 

For the sake of illustration and without lack of 
generality, we specify our technique with reference to 
the so called Opinion Changing Rate (OCR) model, a 
continuous-time system of coupled phase oscillators in- 
troduced for the modeling of opinion consensus in social 
networks [l3j |. and representing a variation of the Ku- 
ramoto model 14j . Other continuos-time (Kuramoto and 
Rossler dynamics) , and also discrete-time (coupled circle 
maps) dynamical systems have been investigated and will 
be reported elsewhere. Given a undirected, unweighted 
graph with N nodes and K edges, described by the ad- 
jacency matrix A = {(% }, we associate to each node i 
(i = 1, . . . , N) a dynamical variable Xj(i) e] — oo, +oo[. 
The dynamics of each node is governed by: 



Xi(t) 



— (3\xj —Xi | 



(i) 

where u>i is the natural frequency of node i (in the nu- 
merical simulations the Wj's are randomly sorted from a 
uniform distribution between u) m i n — and ui max = 1), 
a is the coupling strength, and A/j is the set of nodes 
adjacent to i, i.e. all nodes j for which <Zjj = dji = 1. 
The constant parameter /?, tuning the exponential factor 
in the coupling term of Eqs. Jl}, switches off the inter- 



action when the phase distance between two oscillators 
exceeds a certain threshold (as usual [l3[ we fix (3 = 3). 
Notice that the interaction between two adjacent nodes 
i and j is weighted by the term b°j / J^jeAfi > where 
bij is the betweenness of the edge and a(t) is a time 
dependent exponent, such that a(0) = 0. In Ref.[l5| it 
has been shown that the ability of a dynamical network, 
as the one in Eqs. ([I]), to maintain a synchronization 
state crucially depends on the value of the parameter a. 
For such a reason, in the DC algorithm to find modular 
structures, we fix the coupling strength a equal to an ar- 
bitrary value such that the unweighted (a — 0) network 
is fully synchronized, and we solve Eqs. with a pro- 
gressively (stepwise) decreasing value of a(t). Namely, 
while keeping fixed er, we consider a(ti+\)ot(t{) — 5a for 
ti + i > t > ti, where £;+i — t\ = T V? (in the following 
T = 2), and 5a is a parameter that will be specified be- 
low. As the edges connecting nodes belonging to the same 
module (to two different modules) have in general small 
(large) betweenness, when a decreases from zero, the cor- 
responding interaction strengths on those edges become 
increasingly enhanced (weakened). Since the network is 
prepared to be fully synchronized, it has to be expected 
that, as a decreases, the original synchronization state hi- 
erarchically splits into clusters of synchronized elements, 
accordingly to the hierarchy of modules present in the 
graph. The individuation of synchronization clusters is 
made in terms of groups of nodes with the same instan- 
taneous frequency x(t). The procedure consists then in 
monitoring the emerging set of synchronization clusters 
at each value of a(t). The best division in communities 
of the graph (the best a value) is individuated by looking 
at the maximum (as a function of a) of the modularity 

Q !• 

In order to comparatively evaluate the performance of 
the algorithm, we have considered, as in Ref. Q, a set of 
computer generated random graphs constructed in such a 
way to have a well defined modular structure. All graphs 
are generated with N = 128 nodes and K — 1024 edges. 
The nodes are divided into four communities, contain- 
ing 32 nodes each. Pairs of nodes belonging to the same 
module (to different modules) are linked with probability 
Pin (Pout)- Pout is taken so that the average number z out 
of edges a node forms with members of other communi- 
ties can be controlled. While z ou t can be then varied, pi„ 
is chosen so as to maintain a constant total average node 
degree < k >= 16. As z ou t increases, the modular struc- 
ture of the network becomes therefore weaker and harder 
to identify. As the real modular structure is here directly 
imposed by the generation process, the accuracy of the 
identification method can be assessed by monitoring the 
fraction p of correctly classified nodes vs. z out . In Fig. Q] 
we report the value of p (averaged over twenty different 
realizations of the computer generated graphs and of the 
initial conditions) as a function of Zout, for the DC algo- 
rithm based on the OCR model of Eqs. (JTJ) , with a = 5.0 
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FIG. 1: Fraction p of correctly identified nodes as a function 
of z ou t (average number of inter-modular edges per node) for 
computer generated graphs with N = 128 nodes, four com- 
munities and an average degree < k >= 16. The results of 
DC methods based respectively on the OCR (open circles) 
and the OCR-HK (full circles) models, are compared with 
three standard methods, two based solely on the topology, 
such as the GN algorithm (full triangles) [|| and the New- 
man Q-optimization fast algorithm (full squares) and one 
based on an evolutionary method, the Simulated Annealing 
algorithm [l7[ (open squares). 



and 5a = 0.1. The resulting performance (open circles) 
is comparable to that of the best methods based solely 
on the topology, such as the GN (full triangles) @] and 
the Newman Q-optimization fast algorithm (full squares) 

1- 

The performance of the DC algorithm considered can 
be made better by adding a simple modification to the 
OCR model which further stabilizes the system. Such 
modification consists in changing in time the natural fre- 
quencies cj's according to the idea of confidence bound 
introduced for the first time by Hegselmann and Krause 
(HK), in the context of models for opinion formation [lij ]. 
Therefore, we will refer to the improved method as the 
OCR — HK dynamical clustering. The confidence bound 
is a parameter e which fixes the range of compatibility of 
the nodes. At each time step, the generic node i, having 
a dynamical variable Xi(t) and a natural frequency 0Ji(t), 
checks how many of its neighbors j are compatible, i.e. 
have a value of the variable Xj (t) falling inside the confi- 
dence range [xj — e, + e]. Then, at the following step 
in the numerical integration, we set u>i(t + At), i.e. the 
node takes the average value of the w's of its compatible 
neighbors at time t. In the OCR — HK, the changes of 
the w(i)'s is superimposed to the main dynamical evo- 
lution of Eq.([T]) and noticeably contributes to stabilize 
the frequencies of the oscillators according to the correct 
modular structure of the network, also reducing the de- 
pendence of the algorithm on the initial conditions. The 



results with computer generated graphs (5a = 0.1) are 
reported in Fig. Q] as full circles. The improvement in 
the performance of the OCR-HK method with respect to 
both the standard OCR and the two topological meth- 
ods (GN and Q-optimization), is evident for z out > 5, 
and it can be made even larger using for smaller val- 
ues of 5a. For completeness, we also report the results 
of an optimization procedure based on a Simulated An- 
nealing (SA) [13], which is presently the most accurate 
method available on the market, though very CPU time 
consuming. As for the computational cost, our algorithm 
provides an improvement with respect to the majority of 
other methods (see Table 1 in Ref.[l7[)- For instance, 
while iterative topological algorithms g, 0] need to re- 
calculate the betweenness distribution all the times a 
given edge is removed, in our case that distribution has 
to be evaluated only for the initial graph, as the clus- 
ter de-synchronization process itself gives a progressive 
weakening of the edges with highest betweenness. We 
have analyzed sparse graphs of size up to N=16,384 and 
found a scaling law of 0(N 1 - 76 ) for the dynamical evo- 
lution of the OCR-HK system. However, since the ini- 
tial calculation of betweenness takes 0(N 2 ) operations, 
the total CPU time scales as 0(N 2 ) too. Considering 
that the fastest algorithm on the market (0(iVlog 2 N)) 
is the Q-optimization one 0], which on the other hand 
is less accurate than the OCR-HK (as shown in Fig. [1]), 
we can conclude that our DC method provides an ex- 
cellent compromise between accuracy and computational 



demand [17]. 



It should be noticed that the proposed method con- 
ceptually differs from the dynamical cluste ring technique 
introduced in Ref. [ll[. Indeed, while in 11] the mod- 
ular hierarchy of a network was associated to different 
time scales in the transient dynamics toward a fully syn- 
chronized dynamics, here it corresponds to a hierarchical 
sequence of asymptotically synchronized states, from the 
initial (a(0) = 0) full network synchronization, to pro- 
gressive cluster synchronization as a(t) decreases. A rel- 
evant consequence is that our technique is almost fully 
insensitive to differences in the initial conditions for the 
phases of the coupled oscillators, as far as the local dy- 
namics is selected to have a unique attractor. 

Finally, we tested how the method works on two typical 
real-world networks: the Zachary Karate Club network 
(JV = 34, K = 78) [lH and the food web of marine or- 
ganisms living in the Chesapeake Bay (N — 33, K = 71) 
[I9L I20I ] . In both cases we have some a-priori knowledge 
of the existing modules. In fact, the karate club network, 
is known to split into two smaller communities, whose de- 
tailed composition was reported by Zachary [l8j. Anal- 
ogously, the food web organisms contain a main separa- 
tion in two large communities, according to the distinc- 
tion between pelagic organisms (living near the surface 
or at middle depths) and benthic organisms (living near 
the bottom). As in the previous simulations, we first 
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FIG. 2: (Color online). The distribution of instantaneous 
frequencies and the correspondent modularity Q are reported 
as a function of a for the OCR-HK model. The application 
to the Karate Club network and to the Chesapeake Bay food 
web are shown in panel (a) and in panel (b) respectively. In 
both the simulations a = 5.0, 5a = 0.1 and e = 0.0005. 



8, 9, 10, 20, 24, 1, S> 7a 1, 12, 13, 21, 23, 33) in which, with 
respect to Refs.jg, |20j, the distinction between pelagic 
and benthic organisms is not only preserved but also 
improved. 

In conclusion, we have introduced an efficient al- 
gorithm for the detection and identification of modu- 
lar structures, that attains a very high precision, with 
a small associated computational effort that scales as 
0(N 2 ). Our method, therefore, could be of use for a reli- 
able modules detection in sizable networks (e.g. biologi- 
cal, neural networks), and can contribute to a better un- 
derstanding of the hierarchical functioning of networked 
systems in many physical, biological and technological 
cases. 

The Authors are indebted with A. Amann, F.T. Arec- 
chi, M. Chavez, R. Lopez-Ruiz and Y. Moreno for the 
many helpful discussions on the subject. M.I. acknowl- 
edges projects n. RFBR 05-02-19815-MF-a, 06-02-16499- 
A, 06-02-16596-A. V.L., A.P. and A.R. acknowledge fi- 
nancial support from the PRIN05-MIUR project "Dy- 
namics and Thermodynamics of Systems with Long- 
Range Interactions" . 



calculate the set of edge betweenness {bij} and then we 
integrate numerically Eqs.JTJ) with the HK modification 
on the w's, with 6a = 0.1 and a = 5.0 (the latter ensures 
again an initial fully synchronized state at a = 0). 

In Fig. [5] the N instantaneous frequencies and 
the modularity Q, are plotted as a function of a (i.e. 
as a function of time) for both the karate club, panel 
(a), and the food web network, panel (b). In panel 
(a) the best configuration, with Q ~ 0.40, is reached 
around —1.0 > a > —2.5 and yields a partition of 
the karate club network into three stable communi- 
ties which very well describe the real situation. The 
largest one, labelled with n.l in the figure (nodes 
9, 10, 29, 31, 15, 16, 19, 21, 23, 32, 33, 34, 24, 25, 26, 27, 28, 
30), fully corresponds to one of the two com- 
munities reported by Zachary, while the sum of 
the remaining two communities, labelled as n.2 
(nodes 1,2,3,4,8,12,13,14,18,20,22) and n.3 (nodes 
5, 6, 7, 11, 17), corresponds to the second Zachary's mod- 
ule of 16 elements. Notice that cluster n.3 represents a 
very well connected subset that is frequently recognized 
as a separated module also by other methods @. More- 
over, the value of the best modularity found is larger 
than that of the Zachary partition into two communities 
(Q ~ 0.37). Analogously good performance is obtained 
for the food web. In panel (b) the highest value of Q, 
namely Q ~ 0.42, is reached for —2.8 > a > —3.8, 
yielding a division of the food web into five communities 
(n.Lnodes 3, 14, 15, 16, 18, 19, 25, 26, 27, 28, 29; n.2:nodes 
22,30,31,32; n.3:nodes 5,6; n.4:nodes 4,17; n.5:nodes 
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